[TRTLLM-12339][feat] enable TRTLLM cross attention backend #15345
Signed-off-by: Guiju Zhang <guijuz@nvidia.com>
/bot run --disable-fail-fast
PR_Github #54082 [ run ] triggered by Bot. Commit:
PR_Github #54082 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #54328 [ run ] triggered by Bot. Commit:
PR_Github #54328 [ run ] completed with state
/bot run
PR_Github #54380 [ run ] triggered by Bot. Commit:
PR_Github #54380 [ run ] completed with state
Caution: Review failed. Pull request was closed or merged during review.
📝 Walkthrough
This PR adds cross-attention and relative position bias support end-to-end:
Changes
Cross-attention and Relative Position Bias
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Description
Split out the attention operator and TRTLLM attention backend changes from #13919 to reduce frequent conflicts with main and make CI validation easier for this smaller, self-contained scope.
This PR intentionally keeps the change self-contained:
- thop.attention and its nanobind signature for cross-attention and relative-attention-bias inputs
- No module, executor, model, or LLM API caller changes are included in this split.
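For readers unfamiliar with the two inputs named above, here is a minimal reference sketch of what cross-attention with an additive relative attention bias computes. It is plain Python for illustration only: the function names (`softmax`, `cross_attention`), the argument layout, and the `1/sqrt(d)` scaling are assumptions made for this sketch, not the `thop.attention` signature or the TRTLLM backend's implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(q, k, v, bias=None):
    """Single-head cross-attention (hypothetical reference, not the real op).

    q:    [q_len][d]       queries from the decoder side
    k:    [kv_len][d]      keys from the encoder output
    v:    [kv_len][d_v]    values from the encoder output
    bias: [q_len][kv_len]  optional additive bias (e.g. relative position bias)
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)  # assumed scaling; some models omit it
    out = []
    for i, q_row in enumerate(q):
        # Scaled dot product of one query against every encoder key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        if bias is not None:
            # The bias is added to the logits before the softmax.
            scores = [s + b for s, b in zip(scores, bias[i])]
        weights = softmax(scores)
        # Weighted sum of encoder values.
        out.append([
            sum(weights[j] * v[j][c] for j in range(len(v)))
            for c in range(len(v[0]))
        ])
    return out


# Usage: one decoder token attending over three encoder tokens.
q = [[0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]
uniform = cross_attention(q, k, v)
biased = cross_attention(q, k, v, bias=[[0.0, -1e9, -1e9]])
```

The key point of the sketch is that queries and keys/values come from different sequences (so `q_len` and `kv_len` may differ), and that the bias is an extra additive term on the attention logits rather than a change to the softmax itself.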
Summary by CodeRabbit